perm filename CHAP3.1[DIS,DBL] blob
sn#208269 filedate 1976-03-26 generic text, type T, neo UTF8
.SSEC(AM as Heuristic Search)
.SSSEC(Heuristic Search)
As the title of this section -- and this thesis -- proclaims, AM is a
kind of "heuristic search" program. That must mean that AM is
exploring a particular "space," using some informal evaluation criteria
to guide it.
Heuristic gourmets sometimes find it useful to classify heuristic searches in the
following way (depending on their flavor):
Spumoni:
The first kind of search paradigm is where each
operator merely adds more knowledge to a central base of facts. For
example, in theorem-provers, each operator is a rule of inference.
Applying an operator adds one more wff to the list of known true statements.
So in some sense ⊗4all⊗* paths lead to a "goal"; the problem is to find a
short, direct path. A solution is "good" if it consists of a very short path
leading to an acceptable goal node.
The quality of the path is very
important.
Rocky road:
The second flavor of heuristic search is that of a guided
tree-search. For example, consider the typical chess program. After
applying an operator (like "castling"), a new node is created (a new
hypothetical board configuration), but if it is necessary to "back
up", then virtually none of the new nodes along that path will be
added to the system's central knowledge base. The task really is to
find ⊗4any⊗* path leading to the goal. A solution is "good" if it consists of
a path leading to an excellent goal node. The quality of the path is not very
important.
.SSSEC(AM's Similarities to Heuristic Search)
AM does the former kind of "self-adding" exploration. It contains, at
any moment, a collection of partially-understood concepts. Its
activities include understanding those concepts better, and
occasionally defining new concepts. AM thus continually enlarges a
body of mathematical knowledge. AM is spumoni-flavored.
If forced into it, one could say that the "nodes" of AM's space are
its partially-filled-out concepts (which include structures,
operators, and interesting relationships between other concepts), and
AM's "operators" are all its tactical rules for filling out those
concepts, and for creating new concepts. AM's "heuristics" of the
heuristic search paradigm are its remaining rules: the
ones which guide the system at a strategic level.
So we're forced to draw a line, separating AM's guiding rules into two
separate categories: trivial tactics versus high-level strategies.
This is painful and artificial, since in reality the rules are distributed
fairly evenly along a continuum.
.ONCE TURN ON "{}"
It would be nice to say that a "node" is simply what AM calls a concept,
but unfortunately
AM is complicated by the fact that the process of "creating and
filling out a new node" would then be very time-consuming. It's better to
only partially fill out newly-created nodes. In fact, the task of
deciding which nodes to fill out a little more -- and precisely which
facets of them to worry about filling out -- is itself handled as a
heuristic search (via the agenda mechanism. See Sections {SECNUM}.3
and {SECNUM}.4). The standard heuristic search paradigm treats
"growing a new node" as a primitive process.
.SSSEC(AM's Differences from Heuristic Search)
There are several difficulties and anomalies in forcing AM into the
heuristic search paradigm. For example, the operators which AM uses
to enlarge and explore the space of concepts are themselves
mathematical concepts (like composing two relations to produce a new
one). Thus AM should be viewed as a mass of knowledge which enlarges
⊗4itself⊗* repeatedly. As far as I know, all previous systems kept
the knowledge they "discovered" distinct from the knowledge used to
make new discoveries$$ Of course this is typically because the two
kinds of knowledge are very different: For a theorem-prover, the
first kind is "true predicate calculus statements", and the second is
"strategies for applying rules of inference". $.
Another difference between AM and other heuristic searchers is that
AM has no well-defined goal criteria. Its sole aim is to preserve the
interestingness level of the activities it performs, of the top tasks
on the agenda. We don't care what AM does -- or misses -- so long as
it spends its time on plausible tasks. There is no fixed set of
theorems that AM should discover (AM is thus not like a typical
⊗4problem-solver⊗*), no fixed set of traps AM should avoid (AM is
thus very different from ⊗4game-players⊗* like chess programs).
For
example, there is no stigma attached to the fact that AM never
discovered real numbers$$
There are many "nice" things which AM didn't -- and can't -- do:
e.g., devising
↓_geometric_↓ concepts from its initial simple set-theoretic
knowledge. Paul Cohen has indicated that
this would be a supremely exciting development: the invention of
geometry -- and all the tools it provides -- by a system which has
neither geometric intuition built into it, nor definitions of
geometric concepts. I do not think this possible. $;
it was rather surprising that AM managed to
discover ⊗4natural⊗* numbers! Even if it hadn't done that, it would be
acceptable if AM had simply gone off and developed ideas in set
theory.
.SSEC(The need for an agenda)
<< This section's heart is in the right place, but it needs
rewriting!! >
This section provides motivation for the following one, but is
virtually content-free and may be skipped if the reader is not
interested in the origins of the agenda scheme.
In a standard heuristic search, a particular operation is applied to
a particular node, and new nodes result. For example, in chess, a
"node" might be a board configuration, and an "operation" might be
castling. In such searches, the action "create a new node X" is
trivial. It may eat up some memory space (e.g., storing a new board
position), but growing a new node rarely requires much "work", much
processing time.
In scientific concept formation, viewed as a heuristic search
process, the situation is quite different, however. A "node" is a
highly-structured concept. We may wish to create it after we find out
just a couple of its facets (e.g., its definition and one interesting
conjecture involving it). At that moment, we would have a
partially-filled-out new node. A few facets would have some entries
in them, but most facets would be blank. By the usual definition of
"creating a node", we would have to face, a this stage, about twenty
tasks, many of them time-consuming (e.g., fill in some examples of
this concept, fill in some generalizations of it,...). For each
blank facet, some time would have to be spent filling it in.
But most of these activities are not cost-effective. Too much time
would be spent, for too little reason. Clearly, we don't want to
generalize and specialize ⊗4every⊗* new concept$$ This would in fact
cause an infinite regress: When concept X is to be created, we must
first find new concepts which are generalizations of X; but to create
such new concepts, we must first find generalizations of
↓_them_↓,...$. As soon as examples of a concept X are filled in,
some pattern may be perceived. At that moment, it is clearly better
to go off and explore that pattern, than to blithely continue on,
filling in all the other facets of X. As an extreme case, suppose
⊗4no⊗* examples of X were found. Then X might be vacuous, and it
would be absurd to spend time trying to specialize it. On the other
hand, there would then be good reason to try to ⊗4generalize⊗* X. In
fact, perhaps one of the best ways for AM to spend its time at that
moment would be to progressively generalize X until one of its
generalizations ⊗4did⊗* have some examples.
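This "progressively generalize X until some generalization ⊗4does⊗* have examples" tactic can be sketched in modern code. The sketch is only an illustration: the link table, example table, and step limit are all invented here, and AM's actual LISP is organized quite differently.

```python
# Toy Genl links and known examples (both tables invented for illustration).
genl = {"Sets-of-size-1000000": "Large-sets", "Large-sets": "Sets"}
known_examples = {"Sets": [set(), {1}, {1, 2}]}

def generalize_until_examples(concept, max_steps=5):
    """Walk up Genl links until reaching a concept with known examples."""
    for _ in range(max_steps):
        if known_examples.get(concept):
            return concept            # found a generalization with examples
        if concept not in genl:
            break                     # ran out of generalizations
        concept = genl[concept]
    return concept

found = generalize_until_examples("Sets-of-size-1000000")
```

Starting from the (vacuous) concept of sets of size one million, the walk terminates at "Sets", the first generalization with known examples.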
In other words, it seems reasonable that sometimes we'd like to quit
working on "fleshing out" the blank facets of a concept, in order to
explore some particular new concept, or in order to perform some
specific task. In each case, some good reasons will have appeared
which induce us to break away from the plodding "filling-out"
activities we were just engaged in. Specific reasons always have
priority over general inertia (focus of attention).
Computer science provides names for several schemes for accomplishing
this redirection of the flow of control:
.B04
(i) We could let the reasons act like little programs, and
⊗4↓_interrupt_↓⊗* each other. When a reason for doing X gained
control, the task X would be started. But then how do we cope with
several reasons pro and con a given activity? Also, when one task
has been done, how can we be sure where the best place to "return" to
is?
.OO
(ii) We could consider that we have one central program, but that it
can ⊗4↓_recurse_↓⊗* whenever a better task becomes evident. One can
then imagine that the control-stack would contain a list of
interrupted tasks, getting more and more worthwhile (better and
better reasons) as we proceed toward the current moment. The current
task can't be interrupted, except for the sake of performing an even
⊗4more⊗* interesting task. If the current task finishes, the normal
recursion process will take us back to finish the last task. But
recent discoveries may dramatically affect the worthwhileness of the
activities on the control stack; how could we reorder them? Within
the standard function-calling scheme, this can't be done.
.OO
(iii) It sounds like we might need to have a ⊗4↓_process_↓⊗* scheme,
involving many active tasks, which can go to sleep and wake up under
certain conditions. Recall that most of the reasons will not really
be in support of a task the size of "fill out a concept X", but
rather one of the size "fill out facet F of concept X". But these
tasks only use up a few seconds each (say averaging around 15
seconds). Is it really worth it to worry about interrupting one of
these tasks, once it's started? Is it worth using a hundred memory
cells to store the state of a process that would only use up 5 more cpu
seconds if we let it wake up and continue? What if we have to keep
around a thousand partially-complete tasks? Probably not worth it.
.OO
(iv) Most of the tasks we have hanging around are fairly independent
of each other, so perhaps we could exploit the power of a
⊗4↓_multi-processor_↓⊗*. But we have thousands of little tasks, and
they spawn new ones. Well, if we only could run the program on a
10,000-processor machine (like the proposed Hypercube machine of <<get
this reference>)... Well, when one of these is built -- and good
systems software exists for it -- this may be worth considering.
Even so, the results of experiment 2 (see Sec. 5.3) indicate that a
very important mechanism is the one whereby highly-rated new tasks
get suggested dynamically, while the current task is executing. So the
gain in speed would only be a small factor (maybe one order of
magnitude -- not three). A much more standard criticism of such a
scheme is that it only chops the time by a constant amount (at best,
10,000), whereas the tasks grow exponentially.
.OO
(v) The most obvious scheme is to just ⊗4↓_execute them all_↓⊗*, in
some order. Even though we may as well look on each task as
indivisible, we simply can't afford to spend 15 cpu seconds executing
each one:$$ Please forgive the pragmatic discussion that is about to
occur, but the reader probably will sympathize with the need to worry
about cpu time and space. $ that would use up 100,000 seconds (days
of cpu time). For each new concept created, about 20 new tasks would
be proposed. The process of "adding a node" would thus use up 5 cpu
minutes of time.
.OO
(vi) Well, what do computers do when they have lots of tasks to do,
of varying priority? Often, they make out an agenda, a schedule of
things to do. The most important tasks are tackled first, and so on
down the line. Typically there will be a ⊗4↓_dynamic scheduler_↓⊗* to
pick the next task at each moment. Maybe our system should do the
same thing, namely schedule all the tasks, to choose them in order of
decreasing estimated worth. That is, pick the one with the best
reasons supporting it, and execute it. Repeat this Select-Execute
procedure over and over again. If a new, valuable task gets
suggested, it can be merged into the agenda. If the new task is even
more valuable than the current one, so what?! This will be a rare
occurrence, and anyway the current task will probably be nearly done
(so we give up an extra few seconds here and there).
.END
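Alternative (vi), the scheme AM actually adopts, amounts to a Select-Execute loop over a priority-ordered agenda. The code below is an illustrative modern reconstruction, not AM's LISP; the task names, priorities, and the `suggest`/`run` interface are all invented.

```python
import heapq

class Agenda:
    """Toy Select-Execute loop in the spirit of alternative (vi).
    The highest-priority task is always picked next."""
    def __init__(self):
        self._heap = []          # max-heap via negated priorities
        self._counter = 0        # tie-breaker keeps insertion order stable

    def suggest(self, priority, task):
        # New tasks may be merged into the agenda at any time.
        heapq.heappush(self._heap, (-priority, self._counter, task))
        self._counter += 1

    def run(self, execute, steps):
        done = []
        for _ in range(steps):
            if not self._heap:
                break
            _, _, task = heapq.heappop(self._heap)
            done.append(task)
            execute(task, self)   # executing a task may suggest new ones
        return done

agenda = Agenda()
agenda.suggest(300, "fill in examples of Sets")
agenda.suggest(100, "fill in specializations of Sets")

def execute(task, agenda):
    # A task suggested mid-run is simply merged in, and may even
    # outrank everything already waiting.
    if task == "fill in examples of Sets":
        agenda.suggest(200, "fill in examples of Set-union")

order = agenda.run(execute, steps=3)
```

Note how the task suggested during execution (priority 200) jumps ahead of the older, lower-rated task (priority 100): the merge is free, and no interrupt machinery is needed.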
This explains the choice of an agenda as the control mechanism for
AM. We have glossed over the interesting problem of how to
intelligently choose which task is the best one to execute next. The
details of how that scheme works are described in the next section.
.SSEC(The Agenda)
.AGENDASEC: SECNUM
.AGENDASSEC: SSECNUM
.AGENDA:
The time has come to discuss the agenda mechanism that AM uses. At
any moment, AM has many (perhaps hundreds or even thousands) tasks
which have been suggested for some reason or other, but haven't been
carried out yet. Each task is at the level of filling in a certain
facet of a certain concept. Recall that each task also has tacked
onto it a list of symbolic reasons explaining why the task is worth
doing. In addition, a number (between 0 and 1000) is attached to
each reason, representing some absolute measure of the value of that
reason (at the moment). One global formula combines all the reasons'
values into a single priority value for the task as a whole. To
reiterate, a task's priority is based on the worths of the reasons
attached to the task. So a typical entry on the agenda might look
like this:
.BEGIN NARROW 0,10
.BBOX
MBOX TASK: Fill-in examples of Sets $
MBOX PRIORITY: 300 $
MBOX REASONS: $
MBOX 100: No known examples for Sets so far. $
MBOX 100: Failed to fillin examples of Set-union, for lack of examples of Sets $
MBOX 200: Focus of attention: AM recently worked on the concept of Set-union $
.EBOX
.END
<<This paragraph is out of place, or not written well: >
The three reasons each have a fairly low rating, and the total
priority of the task is therefore not great. It is, however, higher
than any single reason. This is because there are three separate
reasons supporting it. The global formula uniting these reasons'
values must not simply take the largest of them (ignoring the rest),
nor should it simply add them up (a few mediocre reasons aren't
really better than one excellent reason). We'll return to this issue
in a moment.
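The constraint on the global formula can be made concrete with a stand-in. The Euclidean norm below is ⊗4not⊗* AM's actual formula, merely one simple function with the required behavior: it always exceeds the largest single rating, yet several mediocre reasons still cannot swamp one excellent reason.

```python
import math

def combine(ratings):
    """Combine reason ratings into one task priority (stand-in formula).
    Euclidean norm: larger than max(ratings), far less than sum(ratings)."""
    return math.sqrt(sum(r * r for r in ratings))
```

For the sample task above, `combine([100, 100, 200])` gives about 245: more than the best single reason (200), but nowhere near the straight sum (400), and three mediocre reasons rated 100 combine to only about 173, still below one excellent reason rated 300.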
Notice the similarity of this to the initial few lines which AM types
just after it selects a job to work on:
.BEGIN NOFILL PREFACE 0 TURN OFF "{}" TURN ON "↑↓\" TABS 18,21 SELECT 3
***Task 2.
Filling in examples of the following concept: "Sets".
3 Reasons:\(1) No known examples for Sets so far.
\(2) Failed to fillin examples of Set-union, for lack of examples of Sets.
\(3) Focus of attention: AM recently worked on Set-union.
.E
The flow of control is quite trivial at the top level: AM picks the
task with the highest priority value, and tries to execute it. As a
side effect, new entries may get added to the agenda while the task
is being executed. The global priority value of the task also
indicates how much time and space this task deserves. The sample task
above might rate 20 cpu seconds, and 200 list cells. When either of
these resource quanta is used up, AM terminates work on the task, and
proceeds to pick a new one. In the case of filling in examples of
sets, the space allowed will be used up quickly.
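The resource-quantum bookkeeping might look like the following sketch. The way the quanta are scaled from the priority value is invented here purely for illustration.

```python
def run_task(step, priority):
    """Run one task's primitive steps until either resource quantum
    is exhausted. Quantum scaling from priority is invented."""
    cpu_quantum = priority // 15       # e.g. priority 300 -> 20 "cpu seconds"
    space_quantum = priority * 2 // 3  # e.g. priority 300 -> 200 "list cells"
    cpu = space = 0
    produced = []
    while cpu < cpu_quantum and space < space_quantum:
        item, t, s = step()            # one primitive step: result, time, space
        produced.append(item)
        cpu += t
        space += s
    return produced

# Filling in examples of Sets: each step is cheap in time but costly in
# space, so the space quantum is the one that runs out.
examples = run_task(lambda: ("a-new-set", 1, 50), priority=300)
```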
The big question now is "How does AM execute a task, once it's
chosen?" Several smaller issues also must be discussed:
.BN
λλExactly what can be done during a task's execution, and what must
be formulated as a new task to be executed sometime in the future?
λλHow do plausible new tasks get suggested?
λλWho thinks up the reasons supporting a task, and how does he do it?
λλWho evaluates each reason, assigning it an absolute numeric rating?
How?
λλDo these ratings change? When and how?
λλWho combines the reasons' values, and comes up with a single
priority value for the task? How?
λλDoes a task's priority value change? When and how?
λλWhat are the dangers -- philosophical and empirical -- of relying on
absolute numeric ratings?
λλHow "tuned" is the system to depending on particular numeric values
and formulae? How fragile/robust is it?
.END
.COMMENT When DIFSECNUM... is available, ONCE TURN ON "{}" here;
These will all be discussed in the next few sections, and illustrated
in the next chapter. More discussion of difficulties and limitations
of these ideas can be found in Section {[2] DIFSECNUM}.{[1]
DIFSSECNUM}, on page {[3] DIFSEC}.
.SSEC(Executing a chosen task)
Okay, so we've selected a task to execute
-- say "Fill-in Examples of Primes".
How exactly do we do this?
The answer is: "We select relevant heuristics, and execute them."
This really just splits our original question into two new ones:
(i) How are the relevant heuristics selected, and (ii) What does it mean for
heuristics to be executed (e.g., how does executing a heuristic rule
help to fill in examples of primes?).
.SSSEC(Executing heuristic rules)
The next section will describe how the relevant heuristics are
gathered up in an efficient manner. For the moment, let's assume
we've done that. What do the heuristics look like, and how do they
carry out the desired task?
A typical heuristic, attached to the concept Operation, says:
.GROUP B816
If the task is to fill in examples of the operation F,
One way to get them is to run F on randomly chosen examples of the
domain of F.
.E
Of course, in the LISP implementation, this situation-action rule is
not coded quite so neatly. It would be more faithfully translated as
follows:
.B816
If CURRENT-TASK matches (FILLIN EXS F←anything), and F is an
Operation,
Then carry out the following procedure:
.INDENT 12,16,0 PREFACE 0 BNN←0
1. Find the domain of F, and call it D;
2. Find examples of D, and call them E;
3. Find an algorithm to compute F, and call it A;
4. Repeatedly:
.INDENT 16,16,0
4a. Choose any member of E, and call it E1.
4b. Run A on E1, and call the result X.
4c. Check that X satisfies the definition of F.
4d. Add X to the Examples facet of F
.E APART
In this case, we see clearly that a heuristic rule really can be
"executable", and that executing it really can produce the desired
kinds of entities. The heuristic is seen to be a goal-directed
(data-directed, pattern-directed, etc.) function call.
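The four-step procedure of this rule can be sketched directly in modern code. Everything below is a reconstruction for illustration: the concept records, the facet names, and the sample "Doubling" operation are invented, not taken from AM.

```python
import random

# Minimal concept records: each concept is a dict of facets (invented).
concepts = {
    "Numbers": {"examples": [0, 1, 2, 3, 4, 5]},
    "Doubling": {
        "domain": "Numbers",
        "algorithm": lambda x: 2 * x,
        "definition": lambda x, y: y == 2 * x,
        "examples": [],
    },
}

def fill_in_examples_of_operation(f_name, tries=10, seed=0):
    """Heuristic for Operations: run F on randomly chosen domain examples."""
    rng = random.Random(seed)
    f = concepts[f_name]
    domain_examples = concepts[f["domain"]]["examples"]   # steps 1-2
    algorithm = f["algorithm"]                            # step 3
    for _ in range(tries):                                # step 4, repeatedly:
        e1 = rng.choice(domain_examples)                  # 4a. choose E1
        x = algorithm(e1)                                 # 4b. run A on E1
        if f["definition"](e1, x):                        # 4c. check definition
            if (e1, x) not in f["examples"]:
                f["examples"].append((e1, x))             # 4d. add to Examples

fill_in_examples_of_operation("Doubling")
```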
Well, this is all fine, but we'll have hundreds -- maybe thousands --
of heuristics; we can't afford to run the If-parts (the left hand
sides, the situation-test, etc.) of each one each time a new task is
chosen. The next subsection discusses how AM is able to zero in on
the relevant heuristic rules.
.SSSEC(Gathering the relevant heuristics)
Recall that there are pointer structures with labels like Genl, Spec,
Exs, Up. For example, the concept Primes might point to the concept
Numbers with a link labelled Genl, and Numbers would point to Primes
with an arc labelled Spec, since Primes are special kinds of Numbers.
By calling for Primes.Genl, we might recover the list (Numbers).
By calling for Numbers.Genl, we might recover the list (Object).
Repeating this process, we eventually will reach some very general
concept like "Anything". If we join together all the lists we find along the
way, we could be said to have ⊗4rippled⊗* away from Primes, in the Genl direction.
For example, here is a diagram which represents rippling away from the concept
Compose-Intersect&Union. Diagonal lines are Genl/Spec, and vertical lines are
Exs/Up. Near the top, many concepts are connected by both kinds of links.
.B0
Anything
\ ~
\ ~
Any-concept
\ ~
\ ~
Any-activity
\ ~
\ ~
Operation
/ \ ~
/ \ ~
/ Composition
/ \
/ \
Dom⊗1=⊗*Range-op Compose-op-⊗1&⊗*-itself
~ ~
~ ~
Compose-Intersect⊗1&⊗*Intersect
.E
<<This para -- and section -- must be rewritten: >
In case you're interested, a concept A is a ⊗4specialization⊗* of a concept
B iff A could be defined as "B.DEFN and...", where B.DEFN is some definition
of B, and the ellipsis can be filled in with any predicate.
Concept A is an ⊗4example⊗* of concept
B iff A satisfies some definition
of B. So these two ideas are not mutually exclusive.
"Composition" is a valid example of an operation, since it satisfies the definition
of Operation. On the other hand, Compositions form a subclass of all operations;
to be a composition, X must be an operation and also must satisfy some stronger
requirements. So Composition is also a specialization of Operation.
The important idea here is to see that the bottom concept is connected to the
other ones in a natural chain-like way. You could start at the bottom and
"ripple" your way upward, by following the Up and Genl links.
Now each concept has facets which contain some heuristics. Some of these are for
filling in, some for checking, some for deciding the interestingness, etc.
For example, one Interest feature of Compose says that a composition is
more interesting if the range of the first argument equals the domain of the
second, and that a composition is meaningless if those two sets don't even
intersect.
Suppose we want to judge the interestingness of the bottom concept. We ripple
upwards, gathering heuristics as we go. Several of these have to do with why
a Composition of any kind might be interesting; several have to do with why
any operation might be interesting, etc.
Or, suppose we want to fill in examples of that bottom concept. Again we
ripple upwards. Compose explains how to use examples of its arguments to
find examples of the composition. Operation explains how to "run" the concept
on randomly chosen domain elements, to derive examples that way.
Any-concept explains how to instantiate the definition of the concept, to symbolically
construct examples directly from the definitions of Compose-Intersect&Intersect.
In general, the suggestions of the more specific concepts will be better than
those of the general concepts. This is the old generality vs. power tradeoff.
So AM begins executing the heuristic rules, rippling upward, and quits after
it's used up its time quantum (its activation energy) for the current task.
If it doesn't make it all the way up to Anything, that's no great loss, since
Anything has such general information that it's almost never practical to
use it. (e.g., to fill in examples of Anything, pick random objects and see if
they satisfy the definition).
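The upward rippling that gathers heuristics can be sketched as a simple breadth-first walk over Genl links, quitting when the time quantum (here, a bare budget count) runs out. The link and heuristic tables below are invented miniatures of the diagram above, not AM's data.

```python
# Genl links and attached heuristics (invented miniatures of the diagram).
genl = {
    "Compose-Intersect&Intersect": ["Composition"],
    "Composition": ["Operation"],
    "Operation": ["Any-concept"],
    "Any-concept": ["Anything"],
    "Anything": [],
}
heuristics = {
    "Composition": ["use examples of the arguments"],
    "Operation": ["run on random domain elements"],
    "Any-concept": ["instantiate the definition"],
    "Anything": ["pick random objects, test the definition"],
}

def ripple_up(concept, budget):
    """Gather heuristics by following Genl links upward, most specific
    first, quitting when the budget (time quantum) is used up."""
    gathered, frontier = [], [concept]
    while frontier and len(gathered) < budget:
        c = frontier.pop(0)
        gathered.extend(heuristics.get(c, []))
        frontier.extend(genl.get(c, []))
    return gathered[:budget]
```

With a small budget the walk returns only the specific (and typically more powerful) heuristics; only a generous budget reaches the nearly-useless advice stored way up at Anything.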
These linkages serve another purpose, besides providing an implicit framework
to guide the gathering of heuristic rules. Suppose one rule asks for
examples of Operations, during its course of execution. How can AM quickly
find all examples of operations? Well, a reasonable answer is to ripple
downward along the Spec direction for some time (perhaps not at all), then
go down the Exs link (only once), and then ripple along Spec again as far
as you please. Graphically,
.B0
C1
\
\
\ ⊗4any number of Spec links⊗*
...
\
\
C2
~ ⊗4one Exs link⊗*
C3
\
\
\ ⊗4any number of Spec links⊗*
...
\
\
C4
.E
C1 might coincide with C2, and C3 with C4.
Using the simple diagram above, the only examples of Operation which are
pictured are Compose and Compose-Intersect&Intersect. The latter is
plucked in two distinct ways, in fact.
So the links are used both to guide the gathering of heuristics, in order to
actually execute a task once it's been chosen from the agenda, and also as
efficient ways to gather examples, specializations, etc. of a given concept.
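The Spec*-Exs-Spec* recipe for gathering examples can be sketched in the same style. The toy link tables below are invented, mirroring the simple diagram; only the walk itself (any number of Spec links, one Exs link, any number of Spec links) is from the text.

```python
# Spec and Exs links (invented, mirroring the simple diagram above).
spec = {
    "Operation": ["Composition"],
    "Composition": ["Compose-Intersect&Intersect"],
}
exs = {
    "Operation": ["Compose"],
    "Composition": ["Compose-Intersect&Intersect"],
}

def spec_closure(c):
    """c plus everything reachable via any number of Spec links."""
    seen, frontier = [], [c]
    while frontier:
        x = frontier.pop(0)
        if x not in seen:
            seen.append(x)
            frontier.extend(spec.get(x, []))
    return seen

def gather_examples(concept):
    """Any number of Spec links, one Exs link, any number of Spec links."""
    found = []
    for c2 in spec_closure(concept):        # ripple down Spec
        for c3 in exs.get(c2, []):          # exactly one Exs link
            for c4 in spec_closure(c3):     # ripple down Spec again
                if c4 not in found:
                    found.append(c4)
    return found
```

On the toy tables, `gather_examples("Operation")` picks up exactly the two examples of Operation pictured in the diagram: Compose and Compose-Intersect&Intersect.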
.SSEC(Creating New Concepts)
As we've discussed before, the "right-hand-side" of a rule can do
three kinds of things:
.BN
λλ Fill in some entries for a facet of a concept.
λλ Create a new concept.
λλ Suggest new tasks and add them to the agenda.
.E
Section 3.4 explained how a heuristic rule might find entries for a
given facet of a given concept. Below, we shall illustrate the second
kind of action a heuristic rule may take: creating a new concept.
.SSSEC(What to do at the moment of creation -- and what to defer)
The reader may be glad to learn that this is in some sense a trivial
action. A concept is nothing more than a bunch of Facet/Entries
pairs (i.e., a bunch of labels which have some information attached
to them). The typical action of a heuristic rule is to fill in
entries for a given facet of a given concept (see Section 3.4). So
all we need do to create a new concept C is to tag the appropriate
data structures (e.g., if there is a global list of all concepts,
then we must add this new name, C, to that list), and to suggest
several new jobs (⊗6"Fill-in the definition of C", "Fill-in examples
of C"⊗*,...). Idealistically, we need fill in nothing at the moment
of C's birth.
Of course, there are two key reasons for filling in as many parts as
we can, at the time of creation:
.BN
λλ Several pieces of information are trivial to obtain at this
moment, but may be hard to reconstruct later (e.g., the reason why C
was created).
λλ In practice, it is impossible to work with a concept unless enough
is known about it to disambiguate it from all others (e.g., its
definition, or its algorithm, or an intuition, or several examples,
etc.).
.E
There are of course two quite general reasons for never filling in
anything without a good reason:
.BN
λλ To fill in any facet of any concept will use up some precious time
and space.
λλ Often the process of filling in a facet can be explosive, can
create new concepts which will ⊗4also⊗* need that particular facet
filled in.
.E
So the universal motto of AM is to fill in facets of a new concept if
-- and only if -- that filling-in activity will be nonexplosive, easy
at the moment, and much harder later on.
In almost all cases, the following facets will be filled in right
away: Definitions, Algorithms, Domain/range, Worth, plus a tie to
some related concept (e.g., if the new concept is a generalization of
Equality, then we can trivially fill in an entry on its
Specializations facet: "Equality".)
In almost all cases, the following facets will ⊗4not⊗* be immediately
apparent: Examples, Generalizations, Specializations, Ties, and
Interestingness.
So AM will typically fill in the former kinds of facets, and relegate
filling in the latter facets to separate tasks which it adds to the
agenda, and which AM may or may not eventually get around to trying.
Thus the agenda will contain several jobs of the form ⊗6"Fill-in
examples of..."⊗*, but very few of the form ⊗6"Fill-in the definition
of..."⊗*.
The actual algorithm for doing this is quite simple. AM spends a
small amount of time trying to fill in each part of C mentioned in
the heuristic rule. AM quits working on a part if some entries don't
start materializing quickly. Whenever a part is "given up" on, a new
task is added to the agenda, to fill in that part of the new concept
C. Any facet which is not mentioned in the rule will also have
a new job added to the agenda, to fill it in. The worth of these
jobs will be fairly low, unless there is some specific reason to the
contrary. The precise priority values will depend on the overall
worth of the facet involved, plus the Worth facet of the new
concept. The facet EXAMPLES is rated much higher than
SPECIALIZATIONS, because looking for examples of a concept is often a
good expenditure of time, producing the raw data on which empirical
induction thrives. Specializations of the new concept would be other
brand new concepts, and thus filling in that facet would be a very
explosive process.
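The creation algorithm just described comes down to: record the facets that are cheap right now, and push a fill-in task onto the agenda for each facet deferred. The sketch below is an invented modern reconstruction; the facet worths, the averaging formula, and the data layout are all illustrative, not AM's.

```python
# Facets deferred at creation time, with rough worths (numbers invented).
DEFERRED_FACET_WORTH = {"examples": 300, "generalizations": 100,
                        "specializations": 50}

def create_concept(name, known_facets, all_concepts, agenda):
    """Record the facets the rule supplied cheaply; for each deferred
    facet, add a fill-in task to the agenda instead of doing it now."""
    concept = dict(known_facets)
    all_concepts[name] = concept
    for facet, worth in DEFERRED_FACET_WORTH.items():
        if facet not in concept:
            # Priority blends the facet's worth with the concept's Worth.
            priority = (worth + concept.get("worth", 0)) // 2
            agenda.append((priority, f"fill in {facet} of {name}"))
    agenda.sort(reverse=True)   # best-supported tasks first
    return concept

all_concepts, agenda = {}, []
create_concept("TIMES-Itself",
               {"definition": "TIMES(x,x)",
                "domain/range": "Number -> Number",
                "worth": 600},
               all_concepts, agenda)
```

Note that the examples task ends up at the head of the agenda, and the explosive specializations task at the bottom, exactly the preference stated above.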
.SSSEC(An illustration: Squaring a number)
Let's take a simple (but not atypical) illustration of all this. AM
has recently discovered the concept of multiplication, and decides
that it is very interesting. A heuristic rule exists which says:
.B816
If a newly-interesting operation F(x,y) takes a pair of N's as
arguments,
Then create a new concept, called F-Itself, taking just one N as
argument, defined as F(x,x), with initial worth Worth(F).
.E
In the case of F = TIMES, we see that F takes a pair of numbers as
its arguments, so the heuristic rule would have AM create a new
concept called TIMES-Itself, defined as TIMES-Itself(x) ≡ TIMES(x,x).
That is, the operation of squaring a number.
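The rule's action is easy to state in code: from a two-argument operation F, build the one-argument operation F(x,x). Only this coalescing step is from the text; the function names below are invented scaffolding.

```python
def make_f_itself(f):
    """From a two-argument operation F, build F-Itself(x) = F(x, x)."""
    def f_itself(x):
        return f(x, x)
    return f_itself

def times(x, y):
    return x * y

# TIMES-Itself(x) = TIMES(x, x): the operation of squaring a number.
square = make_f_itself(times)
```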
What would AM do in this situation? The global list of concepts
would be enlarged to include the new atom "TIMES-Itself", and the
facets of this new concept would begin to be filled in. The
following facets would get filled in almost instantly:
.BBOX
MBOX $
MBOX NAME: TIMES-Itself $
MBOX $
MBOX DEFINITIONS: $
MBOX ORIGIN: λ(x,y) [TIMES.DEFN(x,x,y)] $
MBOX $
MBOX ALGORITHMS: λ(x) [TIMES.ALG(x,x)] $
MBOX $
MBOX DOMAIN/RANGE: Number α→ Number $
MBOX $
MBOX GENERALIZATIONS: TIMES $
MBOX $
MBOX WORTH: 600 $
MBOX $
.EBOX
The name, definition, worth, and domain/range are specified
explicitly by the heuristic rule.
The lambda expression stored under the definition facet is an
executable LISP predicate, which accepts two arguments and then tests
them to see whether the second one is equal to TIMES-Itself of the
first argument. It performs this test by calling upon the predicate
stored under the definition facet of the TIMES concept. A trivial
transformation of this provides an algorithm for computing this
operation.
The worth of TIMES is 600 at the moment, and this becomes the worth
of TIMES-Itself.
TIMES-Itself is by definition a specialization of TIMES, so the
SPECIALIZATIONS facet of TIMES is incremented to point to this new
concept. Likewise, the GENERALIZATIONS facet of TIMES-Itself points
to TIMES.
Note how easy it was to fill in these facets now, but how difficult
it might be later on, "out of context". The task of, e.g., filling in
⊗4Specializations⊗* of TIMES-Itself will be no harder later on than
it is right now, so we may as well defer it until there's a good
reason for it. This task will probably be added to the agenda with so
low a priority that AM will never get around to it, unless some new
reasons for it emerge.
The task ⊗6"Fill-in examples of TIMES-Itself"⊗* is probably
worthwhile doing soon, but again it won't be any harder to do at a
later time than it is right now. So it is not done at the moment;
rather, it is added to the agenda (with a fairly high priority).
Incidentally, the reader may be interested to know that the next few
tasks selected would create the inverse of this operation (i.e.,
integer square-root), and then create a new kind of number, the ones
which can be produced by squaring (i.e., perfect squares). Perfect
squares are worth having because Integer-square-root is defined
precisely on that set of integers.
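The two follow-on discoveries mentioned here are easy to state in modern code; this sketch is mine, not AM's, and the function names are invented.

```python
def integer_square_root(n):
    """Largest k with k*k <= n: an inverse of squaring on the integers."""
    k = 0
    while (k + 1) * (k + 1) <= n:
        k += 1
    return k

def is_perfect_square(n):
    """n can be produced by squaring exactly when the inverse is exact."""
    k = integer_square_root(n)
    return k * k == n
```

The perfect squares are precisely the integers on which `integer_square_root` inverts squaring exactly, which is why that set is worth naming.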
.SSSEC(The Intuition for Squaring)
The notion of "Intuition" is so strange that I shall take every
opportunity to try to explicate it. This subsection is a digression
to that purpose, and may therefore be skipped forever, returned to
later, or read right now.
Consider what happens when the task ⊗6"Fill-in intuitions of
TIMES-Itself"⊗* gets chosen. Rippling upward, we find that the
intuition for TIMES is a rectangle (in the plane) built out of unit
squares, where the length of each side of the rectangle represents
the two arguments to TIMES, and the area of the rectangle (i.e., the
rectangle itself) represents the value of TIMES. For the current
operation, this translates to a rectangle whose sides are all of equal
length: i.e., a square. If AM is given this notion explicitly, it
will actually suggest renaming the TIMES-Itself operation "Square".
Notice the nice intuitive image that this will cause for
square-rooting, since a number n is represented as a pile of n 1x1
squares. To take the square-root of a number, we must arrange that
pile into one large square, and then see how long one side is.
This also suggests several trivial relationships to verify. Since the
number 1 is intuited as a 1x1 square, it seems intuitively clear to
AM that Times-Itself(1) will be equal to 1. This is then checked
"officially" (using the definition of Times-Itself) and verified.
Similar considerations suggest that Times-Itself(0) will equal 0, and
that in general Times-Itself(x) is much larger than x.
Although AM never noticed it, this intuition is just the right one to
notice that to get from n⊗A2⊗* to (n+1)⊗A2⊗*, it is necessary to add
2n+1 unit squares: i.e., that n⊗A2⊗* + 2n + 1 = (n+1)⊗A2⊗*. This is a
far from trivial result to notice, when one doesn't know about
algebra. In fact, AM couldn't verify such a general conjecture, since
it knows neither the concept of mathematical induction, nor anything
about proof in general.
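Lacking induction, the most a program in AM's position could do is check the conjecture on many instances. A small Python sketch of that empirical check (the function name is invented):

```python
def border_size(n):
    """Unit squares added in going from an n-by-n square to an
    (n+1)-by-(n+1) square: an L-shaped border along two sides,
    plus one corner cell."""
    return (n + 1) ** 2 - n ** 2

# Empirical verification on many instances -- no proof, just evidence,
# which is all AM could muster without knowing induction.
assert all(border_size(n) == 2 * n + 1 for n in range(1000))
```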
.SSEC(Proposing New Tasks)
Sections 3.4 and 3.5 explained how a heuristic rule can fill in entries
for a certain facet of a concept, and how a rule might result in a new
concept being created. This section explicates the third kind of activity
which a heuristic rule can initiate: suggesting new jobs, and adding them
to the agenda.
A few more of the questions raised in section 3.3 can then be answered:
.BEGIN INDENT 4,0,0 PREFACE 0
how do new tasks get suggested?
how do the reasons for the tasks get thought up?
how do those reasons get rated?
.END
A heuristic rule will have the form
.B816
If ⊗4<some empirical situation>⊗*...
Then ⊗4<some things to do>⊗*...
.E
Among the "things to do" are suggestions for future tasks. These suggestions
are then simply added to the agenda list. That's it!
Each such suggestion consists of two parts: the job itself, plus reasons for
why this job should be done. A typical job has the form ⊗6"Fill in
examples of Perfect-squares"⊗*. Each reason supporting a job contains an
English sentence (an opaque string, a token) and a rating
(a number between 0 and 1000). One global formula will combine all the reasons'
ratings into one single priority value for the task.
For example, one rule says
.B816
If very few examples of X are found,
Then add the following task to the agenda: "Generalize the concept
X", for the following reason: "X's are quite rare; a slightly less
restrictive concept might be more interesting"; this reason's rating is
the ratio of nonexamples/examples found.
.END
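In schematic form, such a rule is just a condition plus an action that appends a (job, reasons) pair to the agenda. Below is a Python sketch of the rule above, with invented names and a simplified test for "very few examples" (AM itself is a LISP program):

```python
def suggest_generalize(X, examples, nonexamples, agenda):
    """If very few examples of X have been found, add a 'Generalize X'
    task to the agenda, with one supporting reason: an opaque English
    token, plus a rating clipped into the 0..1000 range."""
    if examples > 0 and examples / (examples + nonexamples) < 0.05:
        rating = min(1000, max(0, nonexamples // examples))
        reason = ("%ss are quite rare; a slightly less restrictive "
                  "concept might be more interesting" % X,
                  rating)
        agenda.append({"job": ("Generalize", X), "reasons": [reason]})
```

Each reason here is a pair: the English token, and a numeric rating derived (as in the rule) from the ratio of nonexamples to examples found.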
Of course, the situation part (left-hand-side) of the rule would really look
more like this:
.BEGIN SELECT 6 NOFILL PREFACE 0 INDENT 4
If the current task was (Fill-in examples of X),
and X is a predicate,
and more than 100 items are known in the domain of X,
and at least 10 cpu seconds were spent trying to randomly instantiate X,
and the ratio of successes/failures is both >0 and less than .05
Then...
.E
Even this is one full step above the actual implementation, where
⊗6"X is a predicate"⊗* would be coded as
⊗6"X ε EXS(PREDICATE)"⊗*. The function ⊗6EXS(X)⊗* looks at
the examples facet of its argument X, at specializations of those examples, etc.,
and recursively includes ⊗6EXS( SPECIALIZATIONS (X) )⊗*.
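A sketch of that recursion in Python, on a toy table of concepts (the actual EXS also chases specializations of the examples themselves; this simplified version only recurs through specializations of the concept):

```python
def EXS(concept, concepts):
    """Collect examples of a concept: the entries on its own examples
    facet, plus (recursively) the examples of its specializations."""
    frame = concepts[concept]
    found = set(frame.get("examples", []))
    for spec in frame.get("specializations", []):
        found |= EXS(spec, concepts)
    return found
```

With this, the test "X is a predicate" becomes the membership test `X in EXS("PREDICATE", concepts)`.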
The most suspicious part of this trigger is the number ".05". Where did it come
from?
It was chosen as a number which is "very much less" than
the "ideal" fraction of examples which should pass a predicate.$$
Ah, well this ideal fraction is clearly at least .1, and at most .9, else
almost every argument is getting the same reply. Well, what is that ideal fraction?
Does it even make sense to say it exists, invariant over all predicates, in all fields
of human endeavor? Of course not; this leads even to formal contradictions, let
alone aesthetic ones! Well, by symmetry we could settle on 0.5; a figure closer
to 0.1 is
arrived at by introspection (that's right, a gut-reaction guess), and would no
doubt be .11111..., if humans had nine fingers instead of ten.
In any event, we shall say that the ideal predicate returns True about 10% of the
time. Thus a predicate returning True less than half this often is too "strict".
Hence the .05 $
That remark was of course tongue-in-cheek, and the only honest
justification for .05 is that you may change it (to .01, or to .25) with virtually
no change in AM's behavior$$ This is the conclusion of experiment 5 (see Section
6.4) $.
The English string is treated simply as a token; that is, AM is not allowed to
inspect parts of it, parse it, transform it, etc.
The most AM can do is compare two such tokens
for equality. Of course, it would not be too hard to extend this capability to
permit AM to syntactically analyze such strings,
or to trivially compute some sort of "difference" between two given reasons.
The heuristic rule also contains
a little program for computing the numeric worth of this reason (at any given
moment). The sample rule above contains a rather simple
formula for computing the worth of that reason.
In actuality, this would be checked to ensure that the result lies between
0 and 1000.
It might be advisable to tack this formula onto the agenda list, instead of just
evaluating it once and tacking on the value returned.
The advantage of the latter approach is to save a great deal of space; the
former procedure would be more "intelligent":
Say that dozens of examples of the concept are found in between the time
the task was proposed, and the time AM selects it to execute it.
Those new examples would change the ratio of exs/non-exs, hence the priority
of the reason, hence the priority of the task itself.
An experiment should be done to judge the effect of this on AM's behavior.
Unfortunately, this experiment has not been done, and AM is stuck with the
simpler notion of statically evaluating the little programs only once, at the time
a task is added to the agenda.
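The two alternatives can be stated compactly. In the Python sketch below (names invented), a reason's rating formula is any zero-argument function; AM's actual choice corresponds to the static version:

```python
def add_reason_static(task, formula):
    """What AM does: evaluate the formula once, keep only the number."""
    task["ratings"].append(formula())

def add_reason_dynamic(task, formula):
    """The untried alternative: keep the formula itself, so the rating
    tracks later changes (e.g., examples found after the task was proposed)."""
    task["formulas"].append(formula)

def current_ratings(task):
    """All ratings as of right now: stored numbers, plus formulas re-evaluated."""
    return task["ratings"] + [f() for f in task["formulas"]]
```

If new examples turn up between the time a task is proposed and the time it is selected, only the dynamic version notices.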
.SSSEC(The Ratings Game)
In general, a heuristic rule can have several reasons in support of a job it
suggests. Each reason has an English phrase and a numeric-valued formula
associated with it. Also, the task may already exist on the agenda, where it
can have several associated reasons supporting it.
One global formula looks at all the ratings for the reasons, and combines them
into a single priority value for the task as a whole. This overall rating is
used to choose which task on the agenda to execute next, and how much
time and space to expend on it before quitting.
Below is that formula, although I hesitate to even present it:
.BEGIN NOFILL INDENT 4 SELECT 6 TURN ON "&↑↓"
Worth(J) = ||SQRT(SUM R↓i&↑2)|| ⊗7x⊗* [ .2⊗7x⊗*Worth(A) + .3⊗7x⊗*Worth(F) + .5⊗7x⊗*Worth(C)]
Where J = job to be judged = (Act A, Facet F, Concept C)
and {R↓i} are the ratings of the reasons supporting J.
.END
For example, consider the job J = (Check examples of Primes).
The act A would be "Check", which has a numeric worth of 100.
The facet F would be "Examples", which has a numeric worth of 700.
The concept C would be "Primes", which at the moment might have Worth of 800.
Say there were four reasons, having values 200, 300, 200, and 500.
The double lines "||...||" indicate normalization: the value of the
square-root must be brought into the range 0 to 1, which here simply
means dividing the result of the square-root by 1000.
In this case, we first compute Sqrt(200↑2 + 300↑2 + 200↑2 + 500↑2) = Sqrt(420,000)
which is about 650. After normalization, this becomes 0.650.
The expression in square brackets is actually computed as the
dot-product of two vectors,
in this case it is the dot-product of (100 700 800) and (.2 .3 .5), which yields
630. This is multiplied by the normalized
Square-root value, 0.650, and we end up with a
final priority rating of 400.
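For concreteness, here is the formula and the worked example transcribed into Python (a sketch; AM itself is in LISP). With the values above it evaluates to about 408, which the text rounds to 400.

```python
from math import sqrt

def worth(act_worth, facet_worth, concept_worth, reason_ratings):
    """Worth(J) for a job J = (Act A, Facet F, Concept C): the
    normalized root-sum-of-squares of the reasons' ratings, times
    the weighted sum .2*Worth(A) + .3*Worth(F) + .5*Worth(C)."""
    normalized = sqrt(sum(r * r for r in reason_ratings)) / 1000
    weighted = 0.2 * act_worth + 0.3 * facet_worth + 0.5 * concept_worth
    return normalized * weighted

# The worked example: J = (Check examples of Primes), four reasons.
priority = worth(100, 700, 800, [200, 300, 200, 500])
```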
The above formula was intended originally as a first pass,
an ad hoc guess, which I expected I'd have to modify later.
Since it has worked successfully,
I have not messed with it. The conclusion is that the particular form of the
formula is unimportant; only some general characteristics need be present:
.BN
λλ The value is at least as big as the value of the best reason.
λλ The more distinct reasons that exist, the higher the value.
λλ The value isn't increased if the same reason is present more than once.
λλ The value is greatly increased if two good but very different reasons exist.
.E
I believe that all of these criteria are absolutely essential to good
behavior of the system.
Experiments 1, 3, and 5 bear on this question$$ See Section 6.4, page 42 $.
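The root-sum-of-squares combination can be checked directly against criteria 1, 2, and 4 (criterion 3 is enforced not by the formula but by the token-equality test that keeps a duplicate reason off the list). A Python sketch:

```python
from math import sqrt

def combine(ratings):
    """The reason-combining part of the Worth formula: the
    root-sum-of-squares of the reasons' ratings, before normalization."""
    return sqrt(sum(r * r for r in ratings))

# 1. At least as big as the best single reason.
assert combine([200, 500, 300]) >= 500
# 2. More distinct reasons mean a higher value.
assert combine([200, 500, 300]) > combine([500, 300])
# 4. Two good but different reasons far outweigh either one alone.
assert combine([500, 500]) > 700
```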
.SSEC(Summary: Agendas and Heuristic Search)
Consider Nilsson's description of depth-first searching, and of
breadth-first searching. He has us maintain a list of "open" nodes.
Repeatedly, we pluck the top one and expand it. In the process,
some new nodes may be added to the Open list. In the case of depth-first
searching, they are added at the top; for breadth-first search, they must be
added at the bottom. For heuristic search, or "best-first" search, they are
evaluated in some numeric way, and then "merged" into the already-sorted
list of Open nodes.
This process is clearly analogous to the agenda mechanism AM uses.
It is also the same as the one used in KRL [reference].
The main difference is that in AM, symbolic reasons are used
(albeit in trivial token-like ways) to decide whether -- and how much --
to boost the priority of a task when it is suggested again.
Agendas are the canonical kind of data structure for a "best-first" process.
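That difference can be made concrete. In the Python sketch below (names invented, and the global rating formula replaced by a simple sum for brevity), re-suggesting a task with a reason it already carries changes nothing, while a genuinely new reason boosts its priority and may reorder the agenda:

```python
def suggest(agenda, job, reason, rating):
    """Merge a suggestion into a best-first agenda. Reasons are opaque
    tokens compared only for equality, as in AM."""
    for entry in agenda:
        if entry["job"] == job:
            # A duplicate reason is ignored; a new one raises the priority.
            entry["reasons"].setdefault(reason, rating)
            break
    else:
        agenda.append({"job": job, "reasons": {reason: rating}})
    # Keep the list sorted best-first (a stand-in for the global formula).
    agenda.sort(key=lambda e: sum(e["reasons"].values()), reverse=True)
```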